Conversation

@iluise
Collaborator

@iluise iluise commented Sep 19, 2025

Description

Add reader to retrieve CSV scores from files generated with Quaver. @Jubeku

Quaver scores can be plotted in the FastEvaluation package by adding the following to the config (the score files must be available locally first):

  nhem_son2022_24h_ifs_oper_an:
    type: "csv"
    label: "IFS Quaver ERA5"
    csv_path: "./scores/scores_nhem_son2022_24h_ifs_oper_an.csv" 
    metrics_dir: "./scores/"
    metric: "rmse"
    region: "nhem"
    streams: 
      ERA5:
        channels: ["2t", "10ff", "q_850", "t_850", "z_500"]
        evaluation: 
          forecast_step: "all"
          sample: "all"
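
As a rough illustration of how such an entry could be checked before plotting, here is a minimal sketch. The helper `check_csv_run` and the list of required keys are assumptions made for this example and are not taken from the PR; the dict mirrors the YAML entry above.

```python
from pathlib import Path

# Hypothetical list of keys a "csv" run entry is expected to carry (assumption).
REQUIRED_KEYS = ("type", "csv_path", "metric", "region", "streams")


def check_csv_run(run_id: str, cfg: dict) -> Path:
    """Validate one run entry and return the path to its CSV score file."""
    missing = [key for key in REQUIRED_KEYS if key not in cfg]
    if missing:
        raise KeyError(f"{run_id}: missing config keys {missing}")
    if cfg["type"] != "csv":
        raise ValueError(f"{run_id}: expected type 'csv', got {cfg['type']!r}")
    return Path(cfg["csv_path"])


# The YAML entry above, written out as the dict a YAML loader would produce.
run_cfg = {
    "type": "csv",
    "label": "IFS Quaver ERA5",
    "csv_path": "./scores/scores_nhem_son2022_24h_ifs_oper_an.csv",
    "metrics_dir": "./scores/",
    "metric": "rmse",
    "region": "nhem",
    "streams": {
        "ERA5": {
            "channels": ["2t", "10ff", "q_850", "t_850", "z_500"],
            "evaluation": {"forecast_step": "all", "sample": "all"},
        }
    },
}

csv_path = check_csv_run("nhem_son2022_24h_ifs_oper_an", run_cfg)
```

Failing early with the run id in the message makes a misconfigured entry easier to spot than a later file-not-found error.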

Issue Number

Closes #930

Is this PR a draft? Mark it as draft.

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
    • (bigger changes and experiments) I have shared a hedgedoc in the GitHub issue with all the configurations and runs for these experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the MatterMost channels and/or a design doc
    • for changes of dependencies: the MatterMost software development channel

@iluise iluise self-assigned this Sep 19, 2025
@iluise iluise added the eval anything related to the model evaluation pipeline label Sep 19, 2025
@clessig clessig requested a review from tjhunter September 19, 2025 20:39
Collaborator

@tjhunter tjhunter left a comment

@iluise I have not tried it, but I trust you did. Approved, with a couple of small comments.

self.csv_path = eval_cfg.get("csv_path")
assert self.csv_path is not None, "CSV path must be provided in the config."

self.data = pd.read_csv(self.csv_path, index_col=0)

One thing I would do is cast all the values to np.float32 (or float). pandas tries to be very clever and will, for example, infer an integer dtype if the data allows it. I am not sure whether xarray can deal with that later.
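
A minimal sketch of the suggested cast, assuming a table with channels as the index and one column per forecast step. The CSV layout and the values are made up for illustration and do not reflect the actual Quaver file format.

```python
import io

import numpy as np
import pandas as pd

# Made-up table: every value happens to be a whole number.
csv_text = "channel,6,12,24\n2t,1,2,3\nz_500,40,55,71\n"

raw = pd.read_csv(io.StringIO(csv_text), index_col=0)
# pandas inferred an integer dtype for every column because no value has a fraction.

# Cast after reading, so that the string index column is left untouched.
scores = raw.astype(np.float32)
```

Casting after `read_csv` rather than passing `dtype=` to it avoids trying to convert the string-valued index column to a float.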

Contributor

@grassesi grassesi left a comment

When you have an object that can be an instance of different subclasses and you want different behavior depending on the subclass, you should use polymorphism to implement it: define a method with the same signature in both subclasses and implement the different behavior there. The advantage is that the caller does not need any knowledge of the differences (as it currently does here).
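
The pattern described above could look roughly like the following sketch. The class names `Reader`, `CsvReader`, and `WeatherGenReader` come from this review; the method name `load_scores`, the constructor arguments, and the score values are hypothetical stand-ins, not the PR's actual code.

```python
from abc import ABC, abstractmethod


class Reader(ABC):
    """Common interface; the caller only ever talks to this."""

    @abstractmethod
    def load_scores(self, metric: str) -> dict[str, float]:
        """Return one score per channel for the given metric."""


class CsvReader(Reader):
    def __init__(self, table: dict[str, dict[str, float]]):
        # Stand-in for a table parsed from a CSV file: metric -> channel -> score.
        self._table = table

    def load_scores(self, metric: str) -> dict[str, float]:
        return dict(self._table[metric])


class WeatherGenReader(Reader):
    def __init__(self, results: dict[str, dict[str, float]]):
        # Stand-in for model output: channel -> metric -> score.
        self._results = results

    def load_scores(self, metric: str) -> dict[str, float]:
        return {ch: vals[metric] for ch, vals in self._results.items()}


def collect_scores(readers: list[Reader], metric: str) -> list[dict[str, float]]:
    # No isinstance checks and no "is data set?" test: each subclass knows what to do.
    return [reader.load_scores(metric) for reader in readers]


readers = [
    CsvReader({"rmse": {"2t": 1.5, "z_500": 60.0}}),
    WeatherGenReader({"2t": {"rmse": 1.25}, "z_500": {"rmse": 55.0}}),
]
all_scores = collect_scores(readers, "rmse")
```

Adding a third reader type later would only require a new subclass; `collect_scores` stays unchanged.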

Comment on lines +73 to +75

data: pd.DataFrame | None # Data attributes (if specified)

This class-level attribute is only used for differentiating between subclasses. Use polymorphism instead, as described in my review comment.

)
_logger.debug(f"Looking for: {score_path}")

if score_path.exists():

This function should be a method of Reader/CsvReader. That way you don't need the "if reader.data is not None" check: the right behavior is already selected when the WeatherGenReader/CsvReader is instantiated. You also no longer need the data class attribute.

@github-project-automation github-project-automation bot moved this to In Progress in WeatherGen-dev Oct 29, 2025

Labels

eval anything related to the model evaluation pipeline

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

implement CSVReader to read quaver scores in FastEvaluation package

4 participants